(CVPR 2018) Two-stream convolutional networks for dynamic texture synthesis

Tesfaldet M, Brubaker M A, Derpanis K G. Two-stream convolutional networks for dynamic texture synthesis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6703-6712.



1. Overview


1.1. Motivation

The two-stream hypothesis models the human visual cortex in terms of two pathways:

  • ventral stream. involved in object recognition
  • dorsal stream. involved in motion processing

Analogously, this paper proposes a two-stream model for dynamic texture synthesis:



  1. appearance stream. encapsulates the per-frame appearance; a ConvNet pre-trained for object recognition
  2. dynamics stream. models the dynamics over time; a ConvNet pre-trained for optical flow prediction


1.2. Contributions & Background

  • combines the appearance of one texture with the dynamics of another to generate entirely novel dynamic textures
  • the first work to demonstrate this form of style transfer
  • Two General Approaches (to texture synthesis)
    • non-parametric sampling
    • statistical parametric models
  • Gram Matrix
    • captures style information while ignoring spatial location
    • [b, c, h, w] → [b, c, hw] & [b, hw, c] → [b, c, c] (see the sketch below)
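
A minimal PyTorch sketch of this reshape-and-multiply computation (the function name and the 1/(c·h·w) normalization are my choices, not necessarily the paper's):

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map: [b, c, h, w] -> [b, c, c]."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)             # [b, c, hw]
    gram = torch.bmm(f, f.transpose(1, 2))    # [b, c, hw] @ [b, hw, c] -> [b, c, c]
    return gram / (c * h * w)                 # normalization convention varies
```

Note that the spatial dimensions are summed out, which is exactly why spatial location is ignored.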

1.3. Future Work

  • extend the idea of a factorized representation into feed-forward generative networks



2. Method


Synthesizing a dynamic texture is formulated as an optimization problem: starting from noise, the output video is iteratively updated so that its two-stream activation statistics match those of the target texture(s).
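
In symbols, the synthesized video is obtained by gradient-based optimization (a sketch of the formulation; the combined loss is given in Section 2.3):

$$\hat{I} = \arg\min_{I} \; \mathcal{L}(I)$$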

2.1. Appearance Stream



  • N_l. the number of filters in layer l
  • M_l. the number of spatial locations in layer l
  • t. the frame (time) index


  • Average over the target frames (as ground-truth).
    • T. the number of target frames
    • k. spatial location index
    • i, j. filter indices
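
Putting these symbols together, a plausible reconstruction of the equations originally shown here (the Gatys-style normalization is an assumption; F^l_{ik}(t) denotes the activation of filter i at spatial location k in layer l for frame t):

$$G^l_{ij}(t) = \frac{1}{N_l M_l} \sum_{k=1}^{M_l} F^l_{ik}(t)\, F^l_{jk}(t), \qquad \hat{G}^l = \frac{1}{T} \sum_{t=1}^{T} G^l(t)$$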


The same Gram matrices are computed for each synthesized frame (as the prediction).

  • The Loss Function



  • L_{app}. the number of layers used to compute Gram Matrices

  • T_{out}. the number of frames being generated in the output
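
With these two symbols, the appearance loss plausibly reads as follows (the exact normalization is an assumption):

$$\mathcal{L}_{\mathrm{appearance}} = \frac{1}{T_{out}} \sum_{t=1}^{T_{out}} \frac{1}{L_{app}} \sum_{l=1}^{L_{app}} \big\lVert G^l(t) - \hat{G}^l \big\rVert_F^2$$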

2.2. Dynamic Stream



  • input. a pair of consecutive greyscale images



  • T−1. the T target frames are grouped into (T−1) consecutive pairs



  • The Loss Function
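
By analogy with the appearance stream, a plausible reconstruction (Gram matrices are now computed on activations of the optical-flow network, with G^l_{dyn}(t) computed from the frame pair (t, t+1); L_{dyn}, the number of layers used, is my notation, not necessarily the paper's):

$$\mathcal{L}_{\mathrm{dynamics}} = \frac{1}{T_{out}-1} \sum_{t=1}^{T_{out}-1} \frac{1}{L_{dyn}} \sum_{l=1}^{L_{dyn}} \big\lVert G^l_{dyn}(t) - \hat{G}^l_{dyn} \big\rVert_F^2$$

where the target $\hat{G}^l_{dyn}$ is averaged over the (T−1) target frame pairs.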


2.3. Overall
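
The overall objective combines the two streams; a plausible reconstruction, with the weights α and β as an assumption (the paper may simply sum the two terms):

$$\mathcal{L} = \alpha\, \mathcal{L}_{\mathrm{appearance}} + \beta\, \mathcal{L}_{\mathrm{dynamics}}$$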



  • memory usage grows with the number of frames being synthesized
  • split the sequence into sub-sequences
    initialize the first frame of each sub-sequence as the last frame of the previous sub-sequence and keep it fixed (see the sketch below)
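
A minimal PyTorch sketch of this chunked optimization (loss_fn, the frame shape, and the L-BFGS settings are assumptions, not the paper's actual interface or hyper-parameters):

```python
import torch

def synthesize_in_subsequences(num_subseqs, sub_len, shape, loss_fn, steps=200):
    """Synthesize a long sequence as chained sub-sequences to bound memory.

    loss_fn: assumed callable mapping frames [T, 3, H, W] to the combined
    appearance + dynamics loss (hypothetical interface, not the paper's API).
    """
    outputs, prev_last = [], None
    for _ in range(num_subseqs):
        # Frames to optimize, initialized from noise; the first frame of every
        # sub-sequence after the first is fixed, so it is not a free variable.
        n_free = sub_len if prev_last is None else sub_len - 1
        free = torch.randn(n_free, 3, *shape, requires_grad=True)
        opt = torch.optim.LBFGS([free], max_iter=steps)

        def closure():
            opt.zero_grad()
            frames = free if prev_last is None else torch.cat([prev_last, free])
            loss = loss_fn(frames)
            loss.backward()
            return loss

        opt.step(closure)
        with torch.no_grad():
            frames = free if prev_last is None else torch.cat([prev_last, free])
        outputs.append(frames if prev_last is None else frames[1:])
        prev_last = frames[-1:].detach()  # kept fixed for the next sub-sequence
    return torch.cat(outputs)
```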



3. Experiments


3.1. w/o Dynamic Stream



3.2. Loss on Flow Decode Layer vs. Concat Layer



  • matching activations of the concatenation layer is far more effective than matching those of the flow decode layer

3.3. Failure Examples



  • fails to capture spatially-inconsistent dynamics
  • fails to capture textures with spatially-variant appearance